The Change You Want to See (Now in 3D)
The goal of this paper is to detect what has changed, if anything, between
two "in the wild" images of the same 3D scene acquired from different camera
positions and at different temporal instances. The open-set nature of this
problem, occlusions/dis-occlusions due to the shift in viewpoint, and the lack
of suitable training datasets present substantial challenges in devising a
solution.
To address this problem, we contribute a change detection model that is
trained entirely on synthetic data and is class-agnostic, yet it is performant
out-of-the-box on real-world images without requiring fine-tuning. Our solution
entails a "register and difference" approach that leverages self-supervised
frozen embeddings and feature differences, which allows the model to generalise
to a wide variety of scenes and domains. The model is able to operate directly
on two RGB images, without requiring access to ground truth camera intrinsics,
extrinsics, depth maps, point clouds, or additional before-after images.
Finally, we collect and release a new evaluation dataset consisting of
real-world image pairs with human-annotated differences and demonstrate the
efficacy of our method. The code, datasets and pre-trained model can be found
at: https://github.com/ragavsachdeva/CYWS-3
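The "register and difference" idea can be sketched as follows. This is an illustrative toy, not the paper's architecture: `frozen_encoder` is a stand-in for a frozen self-supervised backbone, and the sketch assumes the two images are already registered rather than estimating correspondences.

```python
import numpy as np

def frozen_encoder(image):
    # Stand-in for a frozen self-supervised backbone: average-pool
    # 8x8 patches of the image into a coarse feature map.
    h, w, c = image.shape
    return image.reshape(h // 8, 8, w // 8, 8, c).mean(axis=(1, 3))

def register_and_difference(img_a, img_b):
    """Hypothetical sketch: embed both images with a frozen encoder,
    (trivially) register the feature maps, and treat per-location
    feature differences as the change signal."""
    fa = frozen_encoder(img_a)
    fb = frozen_encoder(img_b)
    # A real model would estimate correspondences between fa and fb;
    # this sketch assumes the images are already aligned.
    diff = np.linalg.norm(fa - fb, axis=-1)  # per-location change magnitude
    return diff > diff.mean()                # crude binary change mask

rng = np.random.default_rng(0)
before = rng.random((64, 64, 3))
after = before.copy()
after[0:16, 0:16] += 1.0                     # simulate an object-level change
mask = register_and_difference(before, after)
```

The changed top-left region lights up in `mask` while unchanged regions stay off, which is the essence of differencing in a frozen feature space rather than in raw pixels.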
ScanMix: Learning from Severe Label Noise via Semantic Clustering and Semi-Supervised Learning
In this paper, we address the problem of training deep neural networks in the
presence of severe label noise. Our proposed training algorithm, ScanMix,
combines semantic clustering with semi-supervised learning (SSL) to improve the
feature representations and enable an accurate identification of noisy samples,
even in severe label noise scenarios. To be specific, ScanMix is designed based
on the expectation maximisation (EM) framework, where the E-step estimates the
value of a latent variable to cluster the training images based on their
appearance representations and classification results, and the M-step optimises
the SSL classification and learns effective feature representations via
semantic clustering. In our evaluations, we show state-of-the-art results on
standard benchmarks for symmetric, asymmetric and semantic label noise on
CIFAR-10 and CIFAR-100, as well as large-scale real label noise on WebVision.
Most notably, for the benchmarks contaminated with large noise rates (80% and
above), our results are up to 27% better than the related work. The code is
available at https://github.com/ragavsachdeva/ScanMix
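The alternation the abstract describes can be illustrated with a generic EM-style clustering skeleton. This is not ScanMix's training code: the E-step here is plain nearest-centroid assignment, and the M-step only re-estimates centroids, omitting the semi-supervised classification update.

```python
import numpy as np

def e_step(features, centroids):
    # E-step (sketch): assign each sample to its nearest semantic cluster.
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.argmin(axis=1)

def m_step(features, assignments, n_clusters):
    # M-step (sketch): re-estimate cluster centroids. The real M-step
    # would also run the SSL classification update on the clusters.
    return np.stack([features[assignments == k].mean(axis=0)
                     for k in range(n_clusters)])

rng = np.random.default_rng(1)
feats = np.concatenate([rng.normal(0, 0.1, (50, 2)),   # "clean-looking" cluster
                        rng.normal(5, 0.1, (50, 2))])  # second appearance cluster
centroids = feats[[0, 99]]          # crude initialisation from the data
for _ in range(5):                  # alternate E and M steps
    labels = e_step(feats, centroids)
    centroids = m_step(feats, labels, 2)
```

After a few iterations the two appearance clusters are cleanly separated, which is the property ScanMix exploits to identify noisy samples even at high noise rates.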
LongReMix: Robust Learning with High Confidence Samples in a Noisy Label Environment
Deep neural network models are robust to a limited amount of label noise, but
their ability to memorise noisy labels in high noise rate problems is still an
open issue. The most competitive noisy-label learning algorithms rely on a
2-stage process: an unsupervised learning stage that classifies training
samples as clean or noisy, followed by a semi-supervised learning stage that
minimises the empirical vicinal risk (EVR) using a labelled set formed by
samples classified as clean and an unlabelled set of samples classified as
noisy. In this paper, we hypothesise that the generalisation of such 2-stage
noisy-label learning methods depends on the precision of the unsupervised
classifier and the size of the training set to minimise the EVR. We empirically
validate these two hypotheses and propose the new 2-stage noisy-label training
algorithm LongReMix. We test LongReMix on the noisy-label benchmarks CIFAR-10,
CIFAR-100, WebVision, Clothing1M, and Food101-N. The results show that our
LongReMix generalises better than competing approaches, particularly in high
label noise problems. Furthermore, our approach achieves state-of-the-art
performance on most datasets. The code will be available upon paper acceptance.
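The 2-stage process can be sketched in a few lines. The small-loss threshold below is a common stand-in for the unsupervised clean/noisy classifier, and the pseudo-labelling in stage 2 is a simplification of semi-supervised EVR minimisation; neither is LongReMix's exact procedure.

```python
import numpy as np

def split_clean_noisy(losses, threshold):
    """Stage 1 (sketch): a small-loss criterion stands in for the
    unsupervised classifier that labels samples clean or noisy."""
    clean = losses < threshold
    return clean, ~clean

def stage2_targets(labels, preds, clean_mask):
    # Stage 2 (sketch): keep the given labels for "clean" samples and
    # fall back to model predictions (pseudo-labels) for "noisy" ones,
    # mirroring the labelled/unlabelled split used in SSL training.
    return np.where(clean_mask, labels, preds)

losses = np.array([0.1, 0.2, 2.5, 0.15, 3.0])  # per-sample training losses
labels = np.array([0, 1, 1, 0, 0])             # given (possibly noisy) labels
preds  = np.array([0, 1, 0, 0, 1])             # current model predictions
clean, noisy = split_clean_noisy(losses, threshold=1.0)
targets = stage2_targets(labels, preds, clean)
```

The paper's hypothesis maps directly onto this sketch: generalisation depends on how precise `split_clean_noisy` is, and on how many samples end up in the sets used to minimise the EVR.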
The change you want to see
We live in a dynamic world where things change all the time. Given two images
of the same scene, being able to automatically detect the changes between them
has practical applications in a variety of domains. In this paper, we tackle
the change detection problem with the goal of detecting "object-level" changes
in an image pair despite differences in their viewpoint and illumination. To
this end, we make the following four contributions: (i) we propose a scalable
methodology for obtaining a large-scale change detection training dataset by
leveraging existing object segmentation benchmarks; (ii) we introduce a novel
co-attention-based architecture that is able to implicitly determine
correspondences between an image pair and find changes in the form of bounding
box predictions; (iii) we contribute four evaluation datasets that cover a
variety of domains and transformations, including synthetic image changes,
real surveillance images of a 3D scene, and synthetic 3D scenes with camera
motion; (iv) we evaluate our model on these four datasets and demonstrate
zero-shot and beyond-training-transformation generalization. The code,
datasets and pre-trained model can be found at our project page:
https://www.robots.ox.ac.uk/~vgg/research/cyws/
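The implicit correspondence idea in contribution (ii) can be illustrated with a minimal co-attention sketch: each location in one image's feature map attends over all locations of the other, producing soft correspondences. The flattened feature shapes and scaled dot-product form are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def co_attention(fa, fb):
    """Hypothetical co-attention sketch between two images' features
    (each of shape (n_locations, dim)): every location in image A
    attends over all locations of image B."""
    scores = fa @ fb.T / np.sqrt(fa.shape[1])        # pairwise similarity
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)          # softmax over B's locations
    return attn @ fb                                 # B's features aligned to A

rng = np.random.default_rng(2)
fa = rng.normal(size=(16, 8))      # 16 locations, 8-dim features, image A
fb = rng.normal(size=(16, 8))      # same grid for image B
aligned = co_attention(fa, fb)     # B's content, soft-registered onto A's grid
```

Comparing `fa` against `aligned` (rather than against raw `fb`) is what lets a change head tolerate viewpoint differences: the attention step absorbs the misalignment before any differencing happens.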